Enh/take along axis #11076
base: main
Conversation
Co-authored-by: Klaus Zimmermann <klaus.zimmermann@smhi.se>
Can one of the admins verify this patch?
edit: added benchmark for sparse implementation.

```python
from dask_take_along_axis import dask_take_along_axis  # copy-pasted module from climix
%timeit dask_take_along_axis(dask_random_arr, from_array(top50_np), axis=0).compute()
# 6.33 s ± 60.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```
Unit Test Results (see the test report for an extended history of previous test failures, useful for diagnosing flaky tests): 15 files ±0, 15 suites ±0, 3h 29m 26s ⏱️ +7m 10s. For more details on these failures, see this check. Results for commit 105ef10; comparison against base commit dafb6ac.
Is there someone available to review/comment on this PR?
Sorry that I don't have the bandwidth to interact properly :/

One thing that concerns me a little is that the test/benchmark isn't really testing much dask, because it is conducted with a dask array that consists of a single numpy array as its only chunk. In that case, performance would probably be expected to be slightly worse than numpy's, but it's also not really representative.

In particular, the point of the sparse array, iirc, was that there is one array for every chunk that covers all the chunks. If this is sparse, that's ok (or at least I thought so when writing it), but it is prohibitive if dense, so I don't think replacing the use of sparse with dense matrices is the way to go. If we want to avoid sparse as a dependency, the aggregation step needs to be re-thought.

As a practical step forward, @bzah, could you do a larger test? Why not apply this to some larger climate data on at least a local cluster with several workers? I suspect that, perhaps more than compute time, memory requirements without the use of sparse would balloon.
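A rough sketch of what such a correctness check could look like (the array shapes, the `top50` index construction, and the commented-out dask calls are illustrative assumptions, not part of the PR):

```python
import numpy as np

# Build an array large enough to be split into several chunks, compute the
# reference result with numpy, and compare it against the dask implementation.
rng = np.random.default_rng(0)
data = rng.random((1000, 100))

# Indices of the 50 largest values along axis 0, as in the benchmark above.
top50 = np.argsort(data, axis=0)[-50:]
expected = np.take_along_axis(data, top50, axis=0)

# With dask one would rechunk `data` into several blocks, e.g.:
#   import dask.array as da
#   d = da.from_array(data, chunks=(100, 100))
#   result = dask_take_along_axis(d, da.from_array(top50), axis=0).compute()
#   np.testing.assert_allclose(result, expected)
assert expected.shape == (50, 100)
```

Running this on a local cluster with several workers, as suggested, would also expose the memory behaviour of the dense aggregation.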
@quasiben maybe this is something your team could review?
This PR adds `take_along_axis`, which works similarly to numpy's `take_along_axis`.

- `pre-commit run --all-files`
Credit
@zklaus for providing a working solution in climix.
Example of use
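This section was left without a snippet; below is a minimal sketch of the numpy semantics the PR mirrors. The `da.take_along_axis` call in the comments is the API the PR proposes and is shown only as an assumption:

```python
import numpy as np

a = np.array([[10, 30, 20],
              [60, 40, 50]])

# Sort each row by gathering values at the per-row argsort indices.
idx = np.argsort(a, axis=1)
sorted_rows = np.take_along_axis(a, idx, axis=1)
print(sorted_rows)
# [[10 20 30]
#  [40 50 60]]

# With the proposed dask API the call would look analogous, e.g.:
#   import dask.array as da
#   da.take_along_axis(da.from_array(a), da.from_array(idx), axis=1)
```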
Performance

This is just a basic benchmark I ran on my laptop to compare how this implementation performs against numpy's.
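The original benchmark numbers come from the author's laptop; a hedged sketch of how the numpy baseline side of such a comparison could be reproduced (array sizes and the `top50` construction are assumptions for illustration):

```python
import timeit
import numpy as np

# Time numpy's take_along_axis as the baseline against which the
# dask implementation would be compared.
rng = np.random.default_rng(42)
arr = rng.random((2000, 500))
top50 = np.argsort(arr, axis=0)[-50:]

t = timeit.timeit(lambda: np.take_along_axis(arr, top50, axis=0), number=10)
print(f"numpy take_along_axis: {t / 10:.4f} s per call")
```

The dask side of the comparison would wrap the same arrays with `dask.array.from_array` and time the `.compute()` call, as in the benchmark snippet earlier in the thread.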